{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.  \n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Azure Machine Learning Pipeline with HyperDriveStep\n",
        "\n",
        "\n",
        "This notebook is used to demonstrate the use of HyperDriveStep in AML Pipeline."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Prerequisites and Azure Machine Learning Basics\n",
        "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration Notebook](https://aka.ms/pl-config) first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n",
        "\n",
        "## Azure Machine Learning and Pipeline SDK-specific imports"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import azureml.core\n",
        "from azureml.core import Workspace, Environment, Experiment, Datastore, Dataset, ScriptRunConfig\n",
        "from azureml.core.compute import ComputeTarget, AmlCompute\n",
        "from azureml.core.conda_dependencies import CondaDependencies\n",
        "from azureml.core.runconfig import RunConfiguration\n",
        "from azureml.exceptions import ComputeTargetException\n",
        "from azureml.pipeline.steps import HyperDriveStep, HyperDriveStepRun, PythonScriptStep\n",
        "from azureml.pipeline.core import Pipeline, PipelineData, TrainingOutput\n",
        "from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal\n",
        "from azureml.train.hyperdrive import choice, loguniform\n",
        "\n",
        "import os\n",
        "import shutil\n",
        "import urllib\n",
        "import numpy as np\n",
        "import matplotlib.pyplot as plt\n",
        "\n",
        "# Check core SDK version number\n",
        "print(\"SDK version:\", azureml.core.VERSION)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Initialize workspace\n",
        "\n",
        "Initialize a workspace object from persisted configuration. If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure the config file is present at .\\config.json"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws = Workspace.from_config()\n",
        "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Create an Azure ML experiment\n",
        "Let's create an experiment named \"tf-mnist\" and a folder to hold the training scripts. \n",
        "\n",
        "> The best practice is to use separate folders for scripts and its dependent files for each step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step. \n",
        "\n",
        "> The script runs will be recorded under the experiment in Azure."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "script_folder = './tf-mnist'\n",
        "os.makedirs(script_folder, exist_ok=True)\n",
        "\n",
        "exp = Experiment(workspace=ws, name='Hyperdrive_sample')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Download MNIST dataset\n",
        "In order to train on the MNIST dataset we will first need to download it from Yan LeCun's web site directly and save them in a `data` folder locally."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "data_folder = os.path.join(os.getcwd(), 'data/mnist')\n",
        "os.makedirs(data_folder, exist_ok=True)\n",
        "\n",
        "urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-images-idx3-ubyte.gz',\n",
        "                           filename=os.path.join(data_folder, 'train-images.gz'))\n",
        "urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-labels-idx1-ubyte.gz',\n",
        "                           filename=os.path.join(data_folder, 'train-labels.gz'))\n",
        "urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-images-idx3-ubyte.gz',\n",
        "                           filename=os.path.join(data_folder, 'test-images.gz'))\n",
        "urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-labels-idx1-ubyte.gz',\n",
        "                           filename=os.path.join(data_folder, 'test-labels.gz'))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Show some sample images\n",
        "Let's load the downloaded compressed file into numpy arrays using some utility functions included in the `utils.py` library file from the current folder. Then we use `matplotlib` to plot 30 random images from the dataset along with their labels."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from utils import load_data\n",
        "\n",
        "# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the neural network converge faster.\n",
        "X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / np.float32(255.0)\n",
        "X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / np.float32(255.0)\n",
        "y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1)\n",
        "y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)\n",
        "\n",
        "\n",
        "count = 0\n",
        "sample_size = 30\n",
        "plt.figure(figsize = (16, 6))\n",
        "for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n",
        "    count = count + 1\n",
        "    plt.subplot(1, sample_size, count)\n",
        "    plt.axhline('')\n",
        "    plt.axvline('')\n",
        "    plt.text(x = 10, y = -10, s = y_train[i], fontsize = 18)\n",
        "    plt.imshow(X_train[i].reshape(28, 28), cmap = plt.cm.Greys)\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Upload MNIST dataset to blob datastore \n",
        "A [datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) is a place where data can be stored that is then made accessible to a Run either by means of mounting or copying the data to the compute target. In the next step, we will use Azure Blob Storage and upload the training and test set into the Azure Blob datastore, which we will then later be mount on a Batch AI cluster for training."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "datastore = ws.get_default_datastore()\n",
        "datastore.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Create Azure Machine Learning datasets\n",
        "By creating a dataset, you create a reference to the data source location. If you applied any subsetting transformations to the dataset, they will be stored in the dataset as well. The data remains in its existing location, so no extra storage cost is incurred."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "dataset = Dataset.File.from_files(datastore.path('mnist'))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Retrieve or create a Azure Machine Learning compute\n",
        "Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Azure Machine Learning Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target.\n",
        "\n",
        "> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
        "\n",
        "If we could not find the compute with the given name in the previous cell, then we will create a new compute here. This process is broken down into the following steps:\n",
        "\n",
        "1. Create the configuration\n",
        "2. Create the Azure Machine Learning compute\n",
        "\n",
        "**This process will take a few minutes and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell.**\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "cluster_name = \"amlcomp\"\n",
        "\n",
        "try:\n",
        "    compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
        "    print('Found existing compute target {}.'.format(cluster_name))\n",
        "except ComputeTargetException:\n",
        "    print('Creating a new compute target...')\n",
        "    compute_config = AmlCompute.provisioning_configuration(vm_size=\"Standard_NC6s_v3\",\n",
        "                                                               max_nodes=4)\n",
        "\n",
        "    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
        "compute_target.wait_for_completion(show_output=True, timeout_in_minutes=20)\n",
        "\n",
        "print(\"Azure Machine Learning Compute attached\")\n",
        "\n",
        "cpu_cluster_name = \"cpu-cluster\"\n",
        "\n",
        "try:\n",
        "    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
        "    print(\"Found existing cpu-cluster\")\n",
        "except ComputeTargetException:\n",
        "    print(\"Creating new cpu-cluster\")\n",
        "    \n",
        "    compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_D2_V2\",\n",
        "                                                           min_nodes=0,\n",
        "                                                           max_nodes=4)\n",
        "    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
        "    \n",
        "cpu_cluster.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Copy the training files into the script folder\n",
        "The TensorFlow training script is already created for you. You can simply copy it into the script folder, together with the utility library used to load compressed data file into numpy array."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# the training logic is in the tf_mnist.py file.\n",
        "shutil.copy('./tf_mnist.py', script_folder)\n",
        "\n",
        "# the utils.py just helps loading data from the downloaded MNIST dataset into numpy arrays.\n",
        "shutil.copy('./utils.py', script_folder)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Retrieve an Environment\n",
        "In this tutorial, we will use one of Azure ML's curated TensorFlow environments for training. Curated environments are available in your workspace by default. Specifically, we will use the TensorFlow 2.0 GPU curated environment."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "tf_env = Environment.get(ws, name='AzureML-tensorflow-2.12-cuda11')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Setup an input for the ScriptRunConfig step\n",
        "You can mount dataset to remote compute."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "data_folder = dataset.as_mount()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Configure the training job\n",
        "Create a ScriptRunConfig object to specify the configuration details of your training job, including your training script, environment to use, and the compute target to run on"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "src = ScriptRunConfig(source_directory=script_folder,\n",
        "                      script='tf_mnist.py',\n",
        "                      arguments=['--data-folder', data_folder],\n",
        "                      compute_target=compute_target,\n",
        "                      environment=tf_env)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Intelligent hyperparameter tuning\n",
        "Now let's try hyperparameter tuning by launching multiple runs on the cluster. First let's define the parameter space using random sampling.\n",
        "\n",
        "In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, the best validation accuracy (`validation_acc`)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ps = RandomParameterSampling(\n",
        "    {\n",
        "        '--batch-size': choice(25, 50, 100),\n",
        "        '--first-layer-neurons': choice(10, 50, 200, 300, 500),\n",
        "        '--second-layer-neurons': choice(10, 50, 200, 500),\n",
        "        '--learning-rate': loguniform(-6, -1)\n",
        "    }\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we will define an early termnination policy. The `BanditPolicy` basically states to check the job every 2 iterations. If the primary metric (defined later) falls outside of the top 10% range, Azure ML terminate the job. This saves us from continuing to explore hyperparameters that don't show promise of helping reach our target metric.\n",
        "\n",
        "Refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-tune-hyperparameters#specify-an-early-termination-policy) for more information on the BanditPolicy and other policies available."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we are ready to configure a run configuration object, and specify the primary metric `validation_acc` that's recorded in your training runs. If you go back to visit the training script, you will notice that this value is being logged after every epoch (a full batch set). We also want to tell the service that we are looking to maximizing this value. We also set the number of samples to 20, and maximal concurrent job to 4, which is the same as the number of nodes in our computer cluster."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "tags": [
          "hyperdriveconfig-remarks-sample"
        ]
      },
      "outputs": [],
      "source": [
        "hd_config = HyperDriveConfig(run_config=src, \n",
        "                             hyperparameter_sampling=ps,\n",
        "                             policy=early_termination_policy,\n",
        "                             primary_metric_name='validation_acc', \n",
        "                             primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, \n",
        "                             max_total_runs=4,\n",
        "                             max_concurrent_runs=4)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### HyperDriveStep\n",
        "HyperDriveStep can be used to run HyperDrive job as a step in pipeline.\n",
        "- **name:** Name of the step\n",
        "- **hyperdrive_config:** A HyperDriveConfig that defines the configuration for this HyperDrive run\n",
        "- **inputs:** List of input port bindings\n",
        "- **outputs:** List of output port bindings\n",
        "- **metrics_output:** Optional value specifying the location to store HyperDrive run metrics as a JSON file\n",
        "- **allow_reuse:** whether to allow reuse\n",
        "- **version:** version\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "tags": [
          "hyperdrivestep-remarks-sample"
        ]
      },
      "outputs": [],
      "source": [
        "metrics_output_name = 'metrics_output'\n",
        "metrics_data = PipelineData(name='metrics_data',\n",
        "                            datastore=datastore,\n",
        "                            pipeline_output_name=metrics_output_name,\n",
        "                            training_output=TrainingOutput(\"Metrics\"))\n",
        "\n",
        "model_output_name = 'model_output'\n",
        "saved_model = PipelineData(name='saved_model',\n",
        "                            datastore=datastore,\n",
        "                            pipeline_output_name=model_output_name,\n",
        "                            training_output=TrainingOutput(\"Model\",\n",
        "                                                           model_file=\"outputs/model/saved_model.pb\"))\n",
        "\n",
        "hd_step_name='hd_step01'\n",
        "hd_step = HyperDriveStep(\n",
        "    name=hd_step_name,\n",
        "    hyperdrive_config=hd_config,\n",
        "    inputs=[data_folder],\n",
        "    outputs=[metrics_data, saved_model])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Find and register best model\n",
        "When all the jobs finish, we can choose to register the model that has the highest accuracy through an additional PythonScriptStep.\n",
        "\n",
        "Through this additional register_model_step, we register the chosen files as a model named `tf-dnn-mnist` under the workspace for deployment."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "conda_dep = CondaDependencies()\n",
        "conda_dep.add_pip_package(\"azureml-sdk\")\n",
        "\n",
        "rcfg = RunConfiguration(conda_dependencies=conda_dep)\n",
        "\n",
        "register_model_step = PythonScriptStep(script_name='register_model.py',\n",
        "                                       name=\"register_model_step01\",\n",
        "                                       inputs=[saved_model],\n",
        "                                       compute_target=cpu_cluster,\n",
        "                                       arguments=[\"--saved-model\", saved_model],\n",
        "                                       allow_reuse=True,\n",
        "                                       runconfig=rcfg)\n",
        "\n",
        "register_model_step.run_after(hd_step)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Run the pipeline"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "pipeline = Pipeline(workspace=ws, steps=[hd_step, register_model_step])\n",
        "pipeline_run = exp.submit(pipeline)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Monitor using widget"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.widgets import RunDetails\n",
        "RunDetails(pipeline_run).show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Wait for the completion of this Pipeline run"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "pipeline_run.wait_for_completion()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Retrieve the metrics\n",
        "Outputs of above run can be used as inputs of other steps in pipeline. In this tutorial, we will show the result metrics."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "metrics_output = pipeline_run.get_pipeline_output(metrics_output_name)\n",
        "num_file_downloaded = metrics_output.download('.', show_progress=True)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "import json\n",
        "with open(metrics_output._path_on_datastore) as f:  \n",
        "    metrics_output_result = f.read()\n",
        "    \n",
        "deserialized_metrics_output = json.loads(metrics_output_result)\n",
        "df = pd.DataFrame(deserialized_metrics_output)\n",
        "df"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For model deployment, please refer to [Training, hyperparameter tune, and deploy with TensorFlow](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/ml-frameworks/tensorflow/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb)."
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "nagaur"
      }
    ],
    "category": "tutorial",
    "compute": [
      "AML Compute"
    ],
    "datasets": [
      "Custom"
    ],
    "deployment": [
      "None"
    ],
    "exclude_from_index": false,
    "framework": [
      "Azure ML"
    ],
    "friendly_name": "Azure Machine Learning Pipeline with HyperDriveStep",
    "kernelspec": {
      "display_name": "Python 3.8 - AzureML",
      "language": "python",
      "name": "python38-azureml"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.9"
    },
    "order_index": 8,
    "star_tag": [
      "featured"
    ],
    "tags": [
      "None"
    ],
    "task": "Demonstrates the use of HyperDriveStep"
  },
  "nbformat": 4,
  "nbformat_minor": 2
}